Learning to Collaborate from Delayed Rewards in Foraging Like Environments
Authors
Abstract
Machine learning techniques are usually applied to coordination problems and to competitive games, but not to collaborative ones. Collaboration and coordination are different: in coordination, the task cannot be completed by a single agent, while in collaboration it can be solved by one agent or by a team, but the use of several agents must be reflected in the performance of the system. In this work, the authors propose the use of influence value reinforcement learning (IVRL) for collaborative problems and test it in a foraging game. In earlier work, the authors showed experimentally that, in coordination problems, the IVRL paradigm performs better than the traditional paradigms (independent learning and joint action learning). Thus, in this paper, the authors compare their paradigm (IVRL) with the traditional ones in order to establish whether reinforcement learning is well suited to collaboration problems, and show that the proposed paradigm performs better than the traditional ones.
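The independent-learning baseline that the abstract compares against can be sketched with tabular Q-learning on a toy foraging task. The environment below (a 1-D strip with a single delayed reward at the food cell) and all parameters are illustrative assumptions, not the paper's actual setup:

```python
import random

def independent_q_learning(episodes=2000, size=5, alpha=0.5, gamma=0.9,
                           eps=0.1, seed=0):
    """Tabular Q-learning for a single forager on a 1-D strip of cells.

    The agent starts at cell 0; the only reward (1.0) arrives when it
    reaches the food at the last cell, so feedback is delayed.
    """
    rng = random.Random(seed)
    actions = (1, -1)  # step right / step left
    q = {(s, a): 0.0 for s in range(size) for a in actions}
    for _ in range(episodes):
        s = 0
        while s != size - 1:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda b: q[(s, b)])
            s2 = min(max(s + a, 0), size - 1)
            r = 1.0 if s2 == size - 1 else 0.0
            # standard Q-learning update toward the bootstrapped target
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in actions)
                                  - q[(s, a)])
            s = s2
    return q
```

An independent learner like this treats other agents, if any, as part of the environment; IVRL and joint action learning differ precisely in how the update incorporates the other agents' behaviour.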
Similar resources
Adaptive intertemporal preferences in foraging-style environments
Decision makers often face choices between smaller, more immediate rewards and larger, more delayed rewards. For example, when foraging for food, animals must choose between actions that have varying costs (e.g., effort, duration, energy expenditure) and varying benefits (e.g., amount of food intake). The combination of these costs and benefits determines what optimal behavior is. In the present s...
Short-term gains, long-term pains: how cues about state aid learning in dynamic environments.
Successful investors seeking returns, animals foraging for food, and pilots controlling aircraft all must take into account how their current decisions will impact their future standing. One challenge facing decision makers is that options that appear attractive in the short-term may not turn out best in the long run. In this paper, we explore human learning in a dynamic decision making task wh...
The application of temporal difference learning in optimal diet models.
An experience-based aversive learning model of foraging behaviour in uncertain environments is presented. We use Q-learning as a model-free implementation of temporal difference learning, motivated by growing evidence for neural correlates in natural reinforcement settings. The predator has the choice of including an aposematic prey in its diet or foraging on alternative food sources. We show h...
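The diet-choice setting described above can be sketched as stateless Q-learning over two actions. The payoff means and noise below are hypothetical values chosen only to illustrate the update rule, not figures from the paper:

```python
import random

def diet_q_learning(trials=2000, alpha=0.1, eps=0.1, seed=1):
    """Stateless Q-learning for a predator's diet choice.

    Action 0 attacks the aposematic (chemically defended) prey;
    action 1 forages on the alternative food source.  The mean
    payoffs below are illustrative assumptions.
    """
    rng = random.Random(seed)
    means = (-0.5, 0.2)  # hypothetical average payoff of each action
    q = [0.0, 0.0]
    for _ in range(trials):
        # epsilon-greedy choice between the two diet options
        a = rng.randrange(2) if rng.random() < eps else q.index(max(q))
        r = means[a] + rng.uniform(-0.1, 0.1)  # noisy payoff sample
        q[a] += alpha * (r - q[a])  # exponential running-average update
    return q
```

After enough aversive samples the learner's value for attacking the defended prey drops below that of the alternative, and the greedy policy excludes it from the diet.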
Comparison of Reinforcement and Supervised Learning Methods in Farmer-Pest Problem with Delayed Rewards
In this paper we propose a method based on the time-window idea which allows agents to generate their strategy using supervised learning algorithms in environments with delayed rewards. It is universal and can be used in various environments. The learning speed of the proposed method and of a reinforcement learning algorithm are compared in a Farmer-Pest problem with delayed rewards. Farmer-Pest problem ...
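The time-window idea can be sketched as a preprocessing step that turns delayed-reward episodes into labeled examples for an ordinary supervised learner. The construction below is an assumption about one plausible form of it, not the paper's exact method:

```python
def time_window_dataset(episodes, window=3):
    """Convert episodes with a single delayed terminal reward into
    supervised training examples.

    Each input is the window of the last `window` steps (e.g. state or
    state-action codes) seen so far, labeled with the episode's final
    reward.  A sketch of the time-window idea; the paper's exact
    construction may differ.
    """
    data = []
    for steps, reward in episodes:
        for i in range(len(steps)):
            win = tuple(steps[max(0, i - window + 1): i + 1])
            data.append((win, reward))
    return data
```

Any off-the-shelf classifier or regressor can then be trained on `data`, sidestepping the credit-assignment machinery that reinforcement learning uses for delayed rewards.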
Copy-when-uncertain: bumblebees rely on social information when rewards are highly variable
To understand the relative benefits of social and personal information use in foraging decisions, we developed an agent-based model of social learning that predicts social information should be more adaptive where resources are highly variable and personal information where resources vary little. We tested our predictions with bumblebees and found that foragers relied more on social information...